Queuing models

Distribution of request inter-arrival times

The problem: Let us assume that we want to model a system consisting of many users and a server, and that we are interested in the response time of the server, that is, the time from the moment a user sends a request to the server until the moment the server has completed its answer (we ignore any communication delays).

Now, let us assume that the service time of the server is a constant (1 second), that is, it always takes exactly 1 second to provide an answer to a request. This means the distribution of the service time is a delta-function with a peak at t = 1 second.

If the users organize themselves in such a way that the next request will be sent to the server only after the last answer was received, then the response time will always be equal to the service time. However, if the users do not coordinate their requests, it may happen that a user sends a request before the last answer was produced. This means that the server is busy when the request arrives and the request has to wait (in some kind of input queue) before it will be processed. Therefore the response time, in this case, will be larger than the service time (service time plus waiting time). We see that the arrival pattern of the requests has an impact on the average response time of the system.

Poisson inter-arrival pattern: It can be shown that, if there are very many users without any coordination among them, the inter-arrival time (that is, the time from the arrival of one request to the arrival of the next request) has an exponential distribution of the form P(t) = alpha * exp(- alpha * t), where alpha is the arrival rate. In fact, the average of this inter-arrival distribution is 1/alpha, which is the average time between two consecutive requests. We see that in this case there is a good probability that the inter-arrival time is much smaller than the average. Therefore it would be interesting to determine what the average waiting time of requests in the server input queue would be. An answer is given by queuing models.
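This distribution is easy to explore numerically. The following minimal Python sketch (the value of alpha is an arbitrary choice for illustration) samples exponential inter-arrival times and shows that well over half of the gaps are shorter than the average - the crowding that causes queuing:

```python
import random

random.seed(42)
alpha = 2.0          # arrival rate (requests per second); arbitrary example value
n = 100_000

# random.expovariate(alpha) draws from the density alpha * exp(-alpha * t)
samples = [random.expovariate(alpha) for _ in range(n)]

mean = sum(samples) / n
frac_below_mean = sum(1 for t in samples if t < mean) / n
print(f"mean inter-arrival time: {mean:.3f} s (theory: {1/alpha:.3f} s)")
print(f"fraction of gaps shorter than the mean: {frac_below_mean:.2f}")
```

For an exponential distribution, the fraction of samples below the mean is 1 - exp(-1), about 0.63, so the printed fraction comes out near 0.63.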

Important conclusion: One cannot say what the response time of a server is without knowing the inter-arrival pattern of the requests. The performance guarantee of the server depends on the hypothesis about its environment (which determines the distribution of incoming requests).

Queuing models

In the simplest case, a queuing model represents a system where service requests arrive randomly (Poisson arrival process, that is, with an exponential inter-arrival time distribution) and a shared resource is needed for performing the service. If the resource is occupied serving one request, subsequent requests must wait in a queue. The time the resource is required for serving a request (the service time) is supposed to be known (either a fixed value or a given time distribution). Simple analytical formulas exist that may be used to calculate the average waiting time in the queue, the total time a service request remains in the system (this is the response time), the average length of the queue, etc. (for an overview, see "slides on Queuing Models").

For the case of a single server with a queue and with Poisson arrival and exponential service time (the so-called M/M/1 queue), we have the following formulas, where λ is the arrival rate, μ the service rate (1/μ is the average service time) and ρ = λ/μ the utilization of the server (ρ < 1):

  - average response time: T = (1/μ) / (1 - ρ)
  - average waiting time: W = T - 1/μ = (ρ/μ) / (1 - ρ)
  - average number of requests in the system: N = ρ / (1 - ρ)
  - average queue length: Nq = ρ² / (1 - ρ)

The factor 1 / (1 - ρ) is typical of most queuing formulas. It shows that the response time and queue length blow up when the average service time approaches the average inter-arrival time. If it reaches the inter-arrival time or becomes larger, the system is completely overloaded and the average queue length increases without bound.
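The standard M/M/1 results (average response time T = (1/μ) / (1 - ρ), average number in the system N = ρ / (1 - ρ)) can be evaluated directly; a minimal Python sketch:

```python
def mm1_metrics(arrival_rate, service_rate):
    """Average M/M/1 measures; requires utilization rho = lambda/mu < 1."""
    rho = arrival_rate / service_rate              # server utilization
    if rho >= 1.0:
        raise ValueError("overloaded: queue grows without bound")
    response = (1.0 / service_rate) / (1.0 - rho)  # average response time T
    waiting = response - 1.0 / service_rate        # average waiting time W
    in_system = rho / (1.0 - rho)                  # average number in system N
    return rho, response, waiting, in_system

# service time fixed at 1 second (service_rate = 1); increase the load
for lam in (0.5, 0.9, 0.99):
    rho, t, w, n = mm1_metrics(lam, 1.0)
    print(f"rho={rho:.2f}  T={t:.1f}s  W={w:.1f}s  N={n:.1f}")
```

Note how T grows from 2 seconds at ρ = 0.5 to 100 seconds at ρ = 0.99 - the 1 / (1 - ρ) blow-up in action.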

Assumptions: The main assumptions for the formulas of the queuing models are the following (they are not always satisfied - but the formulas often still provide good approximations): (a) independent arrival of individual requests (Poisson arrival) - not satisfied when bursty arrivals (partly self-similar) are considered (see second part of "slides on Queuing Models"), or when requests arrive at regular intervals; (b) the distribution of service times may not be exponential - various other distributions can be considered - exact formulas exist for several of them. If the simplifying assumptions are not satisfied, one often uses simulations to study the performance of a given system (see comments below).
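As an example of point (b): for a constant (deterministic) service time, the exact M/D/1 waiting time follows from the Pollaczek-Khinchine formula and is exactly half the M/M/1 value. A small sketch comparing the two:

```python
def waiting_mm1(rho, service_time):
    # exponential service times: W = rho * S / (1 - rho)
    return rho * service_time / (1.0 - rho)

def waiting_md1(rho, service_time):
    # constant service times (Pollaczek-Khinchine): W = rho * S / (2 * (1 - rho))
    return rho * service_time / (2.0 * (1.0 - rho))

rho, s = 0.9, 1.0
print(waiting_mm1(rho, s))   # about 9 seconds
print(waiting_md1(rho, s))   # about 4.5 seconds: constant service halves the wait
```

Less variability in the service time means less waiting, but the 1 / (1 - ρ) blow-up remains.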

Example: Assume a constant service time equal to 90% of the average inter-arrival time (ρ = 0.9). If the requests arrive at regular time intervals, each request will be processed before the next one arrives - there will be no queuing delay. If the requests arrive independently (according to a Poisson arrival process), the above formula shows that the response time will be 10 times larger than the service time - a queuing delay 9 times larger than the service time. (Strictly speaking, the M/M/1 formula assumes exponential service times; with a constant service time the exact average response time is about 5.5 times the service time - smaller, but still dramatic compared to regular arrivals.)

Simulation programs: (a) one thread per active object - or (b) a program with a future event list

Simulation models: When queuing models are not easily applicable (because the underlying assumptions are not met - or the system is more complicated), one may build simulation models. They can be quite detailed, depending on what aspects are important for the modeling. However, they may also require much CPU power for obtaining statistically valid results.

There are basically two approaches to writing a simulation program:

  1. (Intuitive) The program contains multiple threads (one for each active object being simulated) - each thread will execute a sleep operation whenever the simulated object performs some actions that take some time in real life (but do not need to be simulated in detail).
  2. (Professional) The simulation program is a sequential program (a single thread) using a queue of future events - the events are ordered according to the future time when they should be performed.

These two approaches use two very different programming structures.
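As an illustration of the second (future event list) approach, here is a minimal event-driven simulation of a single server with a queue, written in Python (the names and parameter values are invented for illustration; with Poisson arrivals and exponential service times it reproduces the M/M/1 behaviour discussed above):

```python
import heapq
import random

def simulate_queue(arrival_rate, service_rate, num_requests, seed=0):
    """Single-server FIFO queue driven by a future event list: a min-heap of
    (time, sequence number, event kind) tuples, processed in time order."""
    rng = random.Random(seed)
    fel = []                # the future event list
    seq = 0                 # tie-breaker so heap tuples compare unambiguously

    def schedule(time, kind):
        nonlocal seq
        heapq.heappush(fel, (time, seq, kind))
        seq += 1

    schedule(rng.expovariate(arrival_rate), "arrival")
    waiting = []            # arrival times of requests waiting for the server
    server_busy = False
    arrivals_generated = 1
    waits = []              # waiting time of each served request

    while fel:
        now, _, kind = heapq.heappop(fel)
        if kind == "arrival":
            if arrivals_generated < num_requests:   # schedule the next arrival
                schedule(now + rng.expovariate(arrival_rate), "arrival")
                arrivals_generated += 1
            if server_busy:
                waiting.append(now)                 # request must queue
            else:
                server_busy = True                  # served immediately
                waits.append(0.0)
                schedule(now + rng.expovariate(service_rate), "departure")
        else:                                       # departure
            if waiting:
                waits.append(now - waiting.pop(0))  # next request starts service
                schedule(now + rng.expovariate(service_rate), "departure")
            else:
                server_busy = False
    return sum(waits) / len(waits)

# lambda = 0.5, mu = 1: M/M/1 predicts an average wait of rho/(mu - lambda) = 1 s
print(simulate_queue(0.5, 1.0, 50_000))
```

Note that simulated time jumps directly from one event to the next - no thread ever sleeps - which is why this structure needs only one thread and scales much better than the one-thread-per-object approach.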


Copied from SEG-2106 course notes: February 22, 2013